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Abstract 

Background: The miRNAs are small non-coding RNAs of roughly 22 nucleotides in length, which can bind with and 
inhibit protein coding nnRNAs through connplennentary base pairing. By degrading nnRNAs and repressing proteins, 
miRNAs regulate the cell signaling and cell functions. This paper focuses on innovative mathematical techniques to 
model gene interactions by algorithmic analysis of microarray data. Our goal was to elucidate which mRNAs were 
actually degraded or had their translation inhibited by miRNAs belonging to a very large pool of potential miRNAs. 

Results: We proposed two chemical kinetics equations (CKEs) to model the interactions between miRNAs, mRNAs 
and the associated proteins. In order to reduce computational cost, we used a non linear profile clustering method 
named minimal net clustering and efficiently condensed the large set of expression profiles observed in our 
microarray data sets. We determined unknown parameters of the CKE models by minimizing the discrepancy 
between model prediction and data, using our own fast non linear optimization algorithm. We then retained only the 
CKE models for which the optimized fit to microarray data is of high quality and validated multiple miRNA-mRNA pairs. 

Conclusion: The implementation of CKE modeling and minimal net clustering reduces drastically the potential set of 
miRNA-mRNA pairs, with a high gain for further experimental validations. The minimal net clustering also provides 
good miRNA candidates that have similar regulatory roles. 
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Background 

Transcriptional and translational processes are funda- 
mental cell mechanisms, involving three main molecular 
species: messenger RNA (mRNA) and their associated 
proteins, as well as microRNAs (miRNAs). 

The miRNAs are small non-coding RNAs of roughly 22 
nucleotides in length, which can bind with and inhibit 
protein coding mRNAs through complementary base 
pairing. A given miRNA can potentially bind and silence 
hundreds of mRNAs across a number of signaling path- 
ways. These repressive miRNA-mRNA interactions occur 
in multiple cellular processes [1-3], and involve two dis- 
tinct modalities: they may directly degrade their target 
mRNAs, or more often inhibit their translation [4-9]. 

The best characterized features determining the tar- 
gets of a specific miRNA are the conserved Watson-Crick 
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pairing to the 5' region (positions 2-7) of the miRNA, 
which are the so-called "seed pairing rules" [3,10-13]. 
Since seed pairing rules are neither sufficient nor neces- 
sary for miRNA- target functions [4,14], they have usually 
been combined with microarray or proteomic analysis 
to find potential miRNA- target pairs [15- 17]. Classical 
microarray data analysis relies mostly on massive appli- 
cation of correlation analysis and linear statistical tech- 
niques to simultaneously acquired gene expression pro- 
files. Combined with profile thresholding and heat map 
displays, these techniques provide commonly used clues 
for qualitative inference. 

In [18], Principal Component Analysis and linear corre- 
lation had been linked with comparative sequence anal- 
ysis to study microarray data recorded during mouse 
stem cells differentiation, and to broadly predict potential 
miRNA-mRNA interactions. 

To go beyond the results of the linear microarray anal- 
ysis applied on time-course microarray data in [18], we 
have formalized two basic architectures for repressive 
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miRNA-mRNA interactions: "Transcription Degradation" 
(TD) and "Translation Inhibition" (TI). 

Traditional chemical kinetics equations had been pro- 
posed to model the transcriptional and translational 
processes without involving the interaction of miRNAs 
[19,20]. 

We have derived Chemical Kinetics Equations (CKEs) 
to model the dynamics of TD and TI motifs, in the 
spirit of [20-30] . The equations are algebraically invariant 
under affine transformation and allow data condensa- 
tion to reduce computational cost. We have implemented 
"minimal net" clustering method, which can control the 
maximum diameter of the clusters, to condense large data 
sets of gene expression. 

Modeling by nonlinear CKEs involves complicated 
parameter estimation problems to fit the very large set 
of expression profiles recorded by microarrays. We have 
developed innovative fast algorithms dedicated to CKE 
parameter estimation, by optimization of the quality of fit 
between model and data. 

We validate only the parameterized motifs having a high 
quality of fit to data. To reach robust conclusions we apply 
a "parameter parsimony" principle, favoring the models 
having the smallest number of parameters. And we have 
also evaluated the robustness of our parameter estima- 
tion algorithm by algorithm by direct testing on simulated 
data. These tests did validate our parameter estimation 
method is quite robust. This nonlinear approach goes fur- 
ther than well-established analysis based on correlation 
techniques combined with heat map displays. 

Methods 

Basic interaction architectures 

To validate if an miRNA M does indeed repress a given 
gene G, we model the chemical interactions of M and G 
within a small network containing the pair (M, G). We 
now sketch two basic interaction architectures and their 
CKE models. 

Transcription Degradation Motifs (TD-motifs) 

We call "TD-motif" any interaction architecture, as 
sketched in Figure 1, involving a single miRNA-mRNA 



pair (M,G) where G is in Target(M), and M degrades the 
transcription of G. The TD motif includes also two sets 
of proteins rep(G) and act(G), namely the transcriptional 
"repressors" and "activators" of G, denoted by 

rep(G) = {Ri, . . .R/,} and act(G) = {^1,^2, . . 

Let g{t),p{t),m{t),ri(t)yaj(t), be the expression levels at 
time t for the chemical species G,P,M,RuAj, We model 
the transcription process by a CKE similar to CKEs pro- 
posed in [20,22,25,29,30], but with a complementary term 
encoding the repressive impact of miRNA M on its target 
mRNA G: 



dgit) 
dt 



= -Pgit) - vgit)m(t) + KREPit)[l -ACTit)] 

(1) 



where ^ > 0 is the degradation rate of G, v > 0 is the 
reaction rate between G and M, /c > 0 is the product 
of the transcription rate by the concentration c of DNA 
templates, concentration which we assume to be some 
constant not depending on time (see [20]). 

The percentage 0 < F(t) = REP(t)[l -ACT(t)] < 1% is 
the fraction of existing DNA templates which are commit- 
ted at time t to transcription of the mRNA gene G. Here 
the percentages REP(t) and ACT{t) are modeled by the 
following products. 



REP{t) = REP lit) X ... X REPqit) 
ACTit) = ACT lit) X ... X ACT kit) 



(2) 



where 



REPiit) 



ACT jit) = 



1 



(1 + Uirm^^i 
1 



(3) 



(1 + Wjajit))^^j 



The parameters SRi > 0, Ui > 0 and SAj > 0, wj > 0 are 
the number of binding sites and the affinity constant for 
the transcriptional factors Ri and Aj. 

Note that the transcription repressors Ri combine mul- 
tiplicatively their individual impacts REPi in REP it), and 
that the REPi are analogous to Hill function (see [19,25]); 
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Figure 1 TD-motif and Tl-motif. 
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similar remarks apply to the transcription activators. The 
multiplicative expressions of REP{f) and ACT if) are typi- 
cal of a so-called "cis -regulatory" function and have been 
derived by J. Goutsias [20]. 

The term KREP{t) [1 —ACT{t)] dt is the concentration of 
new G molecules synthesized by transcription during the 
small time interval [t, t-\- dt], while the repressive interac- 
tions of M and G eliminates vg(f)m{f)dt molecules of G, 
and natural decay destructs Pg(t)dt molecules of G. 

Translation-Inhibition Motifs (Tl-motifs) 

We call "Tl-motif" any interaction architecture, as 
sketched in Figure 1, involving a set Mi, . . . , of miR- 
NAs inhibiting the translation of mRNA gene G, by 
repressing the expression of the protein P generated by 
G. Let (p(t),mi(t),g(t)) be the concentrations at time t 
of protein P, miRNA M^, and mRNA G. In the spirit of 
[20,21], we model the translation inhibition dynamics by 
the CKE 

= -yp(t) + Xg(t)H(t) (4) 

where y > 0 and k > 0 are the degradation rate and 
the translation rate for protein P and where H(t) is the 
percentage of G molecules committed at time t to the 
translation of G. Thus H(t) encodes the inhibiting impact 
of the miRNAs Afi, . . . , Af^ on the translation of G, and is 
modeled as a product of terms similar to REP(t): 

Hit) =Hi(t)x ...X Hkif) where 



{l + Uimi{t)f^i 

The parameters SMi > 0 and Ui > 0 are resp. the 
number of binding sites and the affinity constant control- 
ling the inhibiting impact of miRNA Mi on the translation 
of gene G. Note that H(t) decreases when the miRNA 
concentrations mi(t) increase. 

Our model for TI motif has been inspired by J. Goutsias 
[20], and we present below the hypotheses and arguments 
justifying the expression oiH{t), There is a key difference 
between the CKEs modeling TD and TI motifs. For TI 
motifs, the concentration of G-molecules committed to 
translation is g{t)H{t)y where g{t) is the concentration of 
G-molecules and//(^) < 100%. 

For TD motifs, the concentration of G-molecules syn- 
thesized by transcription is KF{t) = acF(t) with F(t) < 
100%, where a and c are respectively the transcription rate 
and the concentration of DNA templates, assumed to be 
constant in time. 

Derivation of chemical {nineties equations 

Our derivation of the regulation equation for TI motif 
is quite similar to presentation given for TD motifs in 



[20], but has several changes in assumptions and formu- 
lations. To derive the CKEs (1) and (4), we propose a few 
hypotheses. 

• Hyp. 1: (TD-motifs) The molecules of the miRNA 
repressor of gene G can strongly bind only at one 
unique specific site of G- molecules, and once a 
single such strong bind occurs, the corresponding G 
molecule degrades extremely fast. 

• Hyp. 2: (Tl-motifs) Each miRNA Mj in the set rep(G) 
of translation inhibitors of G can weakly bind with G 
but only at specific binding sites constituting a set 
BINDj of size Sj. The sets BINDi, BIND2, ... are 
pairwise disjoint. Once a G molecule thus binds with 
one inhibitor My, then this G molecule will fail to 
translate. 

• Hyp. 3: For any given G-molecule, call Xj the random 
number of sites in BINDj which actually bind with 
one "M/'-molecule. We assume that the random 
variables XifX2f are independent. 

Hyp.l is based on the fact that only a small fraction of all 
messenger RNAs have more than a single miRNA binding 
site and miRNA bound with an mRNA gene G has a lim- 
ited effect on the mRNA G, but affects more substantially 
the protein P generated by G ([4,31]. 

Consider first any TI motif involving an mRNA gene 
G generating the protein P , as well as a set repiG) = 
{Mi,M2, . . .M/^} of k translation inhibitors. We now 
derive the CKE (4) for this TI motif. Call p(t),g(t),mj(t) 
the concentrations of P, G, Mj and let H(t) be at time t 
the percentage of existing G molecules committed to the 
translation of G into P. The basic CKE driving translation 
of G into P is then, as seen in [20], 

^ = -yp(t) + Xg(t)H(t) (6) 

where y. A, are the degradation and translation rates of P. 

The main point is to compute H(t), 

For k = 0, there are no translation inhibitors, and hence 
H(t) = 100%. For k = 1, let Q[S] be the set of G molecules 
with exactly S binding sites bound to Mi molecules, where 
0 < S < Si. Let q(S,t) be the concentration of Q[S] 
molecules. We have the forward and backward reactions 

Q[S] +Mi Q[S + 1] and Q[S + 1] - Q[S] +Mi (7) 

By molecular collision theory [23], the concentration 
p(t) of free Mi molecules which bind at time t on Q[S]- 
molecules to produce Q[S + 1] -molecules by forward 
reaction 7, is proportional to the product of mi(t)q(S, t) 
by the number {Si — S) of vacant binding sites. Hence, for 
some constant c, 

p{t) = c{Si-S)mi{t)q{S,t) (8) 
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Similarly the concentration r (t) of Mi molecules freed by 
backward reaction 7 is given by 



T(t) = c(S-\-l)q(S-\-ht) 



(9) 



for some constant c. At chemical equilibrium, we have 
r(t) = p{t)y and hence 

q{S +l,t) = umi(t)-^—q{S, t) 

where w is a new constant. By recurrence on 5, this implies 

q{S. t) = Cl [umi (t)]' ^(0, t) where = ^^^^'^'^'^^^ 

(10) 

This entails 

Si Si 
git) = J2 ^(^' t) = q(0, t) [umit)f 

S=0 5=0 

= [l-^umi(t)f' q(0, t) 

The set of G-molecules committed to translation into P 
at time t is identical to q{Qy t). Thus H{t) = q(Oy t)/g(t), 
and (11) yields //(O = 1/[1 + umi{t)]^K 

Using hypothesis Hyp 3, a recurrence on k extends the 
argument just given for k = 1, to prove that for k > 1, the 
percentage H(t) of G molecules committed to translation 
into P is given by: 

1 1 



(11) 



H(t) = 



X ... X 



(12) 



which proves CKE (4) for TI motifs. 

For TD motifs, the cis-regulation function F(t) in CKE 
(1) has been derived in [20]. By using hypothesis 1 and 
molecular collision theory, we directly deduce that the 
concentration the concentration of mRNA G degraded by 
miRNA M is proportional to the product of both con- 
centrations of G and M. Thus we justify the miRNA 
degradation term —vg{t)m{t). 

Invariance by affine profile transformations 
Affine profile transformations 

Call r c the set of all possible expression level profiles 
r(t) indexed by time ^i, . . . , To any pair T = (a, b) of 
real numbers, we associate an affine profile transformation 

r ^ Tr defined by r{f) (ar(t) + b) 

for all time dates t Let T be the set of all such affine profile 
transformations . 

In microarray data sets, expression levels of genes are 
recorded via optical analysis of fluorescence intensities, 
and hence depend strongly on experimental acquisition 
modalities. So relative expression levels between pairs of 
recorded chemical species are more meaningful quan- 
tities, and graphic displays of microarray data by "heat 
maps" often involve logarithms of raw data. 



Our microarray data record expression levels which as 
a first approximation can be viewed as unknown affine 
transformations of concentrations. Since our CKEs (1), (4) 
were derived for concentrations, we need to check how 
these CKEs and their parameters change under generic 
affine profile transformations. 

Affine invariance of CKE models 

Consider first a TD-motif involving an mRNA gene G, 
a protein P, an miRNA M, transcription repressive pro- 
teins 7?i,7?2) . . .> and transcription activating proteins 

Ai,A2, The CKE model (1) links the concentrations 

gfff nif ru aj of G, M, P, Ru Aj. Let J^,^, m, r^, aj be the corre- 
sponding recorded expression profiles. Assume that each 
such recorded profiles r is linked to the concentration 
profile 4 by some unknown affine profile transformation 

From CKE (1), one directly deduces that gfp,m,ruaj 
verify a new CKE having an algebraic form completely 
similar to CKE (1), but where the original parameters 
y6, V, /c, uu wj are replaced by new parameters v, /c, wy, 
and where the integers S7?i,Si?2> • • and SAi,SA2, . . ., 
remain unchanged. 

The new parameters are easily expressed in terms of the 
original ones and of the coefficients of the affine transfor- 
mations, but this is irrelevant practically since we will use 
microarray data to directly compute the new parameters 
for a CKE of type (1) linking recorded expression levels. 

Similar computations for Tl-motifs show that this affine 
invariance property also holds for CKE (4). Hence to 
model a TD or a TI motif, we can fit a CKE model of type 
(1) or (4) to recorded expression profiles, even though the 
theoretical model justification involved true concentra- 
tions, which are not directly measured by microarrays. 

The key assumption is that, for each chemical species 
C, the expression levels c(t) of C recorded by microarray 
are approximately linked to the concentration c(t) of C by 
some affine relation c(t) = a(C)c(t) + Z?(C), where the 
unknown coefficients a(C) and b(C) may depend on the 
species C. 

The preceding algebraic model invariance under mul- 
tiple affine profile transformations strongly suggests that 
adequate distances between dynamic profiles of recorded 
expression levels should be invariant under affine profiles 
transformations, as developed in the next section. 

Condensation of expression levels profiles 

Microarray data typically record several tens of thousands 
gene expression profiles. So computational costs to fit 
microarray data to all potential TD-motifs and Tl-motifs 
would of course be prohibitive. A natural option to reduce 
combinatorial explosion is to cluster the observed profiles. 

Since our goal is to model expression profiles by ODEs 
of type (1) and (4), we need to control the diameters of all 



Luo et al. BMC Systems Biology 20 1 4, 8: 1 9 
http://www.biomedcentral.eom/1 752-0509/8/1 9 



Page 5 of 1 5 



clusters of expression profiles. This led us to reject hier- 
archical clustering as well as K-means clustering, and to 
implement in the space of expression profiles a "minimal- 
net" clustering technique, inspired by an innovative tech- 
nique for automatic generation of prototypes in shape 
spaces (see [32]). In view of the preceding section, the 
diameters of clusters in the space of profiles should be 
measured by a distance invariant under affine profiles 
transformations. 

Affine invariant distance between profiles 

Recall that F c is the set of all profiles r(t) indexed by 
time dates ti,...ytq. The mean and variance of a profile r 
are denoted by 

f = -[r(ti) + . . . + r(tq)] ; 

var(r) = -[(r(ti) -rf + ... + (r(tq) - rf] 

Define normalized profiles nor(r) and profiles correla- 
tions corrifiy r2) by 

nor{r){t) = ; 

^varir) 

corr(r\,r'i) =< nor(ri)y nor(r2) > 

where < •, • > is the usual scalar product in RP, 

We then define a distance D(ri, r2) between profiles ri 
and r2 by 

D{ri,r2) = V2-2corr(ri,r2)) 

For any affine profile transformation T in T , 
one has nor{r) = nor(Tr), and hence D(ri,r2) = 
D{nor{ri)y nor{r2). Thus for any affine profile transforma- 
tions Ti, r2 , and any profiles ri, r2 one has 

D{Tiri,T2r2)=D{ri,r2) 

So the profile distance D is invariant by affine profile 
transformations 

Minimal net clustering 

The set T of all profiles is now endowed with a distance D 
invariant by affine profile transformations. Call mPR the 
large set of miRNA profiles recorded at time ^i, . . . , by 
our microarrays. We fix a maximum radius s for profiles 
clusters. We seek to partition mPR into disjoint clusters 
CLi, . . . , CLr such that each CLj has diameter inferior to 
2s, and we also want the number NN of clusters to be as 
small as possible. We have implemented an iterative algo- 
rithm to generate this type of minimal net clustering, in 
the spirit of [32]. 

Define a distance function D{x,y), where x,y repre- 
sent the observations of the data. Denote by D{x, Y) the 



distance between an observation x and a set Y, where 
D{x,Y) = minygy Let Let [x] denote the clus- 

ter containing single point x, and F be the set of all 
observations, the minimal net algorithm is as follows: 

1. step 1, let (xi.yi) = argmax^yD(x,y). Then let xi 
and yi become the representatives of two initial 
clusters. Let CLi =[xi], CL2 =[x2]y 

Ci = {CLiy CL2} andi^i = F \ Ci representing the 
remaining points in F excluding the 2 clusters. 

2. After step n — 1, we obtain 

Cn-i = {CLi, CL2, . . . CLn} and Rn-i = ^ \ Cn-i, 
where CLj is a cluster that has single point, 
; = !,...,« + L 

In step n, let HD = mdiXxeRn-i C^-i), 
representing the maximum distance between 
observation x in Rn-i and set Cn-i - liHD > s, find 
xhd = argmax^^j^^^D(x, C^_i). Let 
Cn = {Cn-i,XHD}^ Rn = ^ \ Repeat until //D < s. 

3. Assume the loop stop at step NN, we have 

CjVN = {CLi, CL2, . . . CLjVN+i}. For an observation 
X, find the point ycLk> belonging to cluster CL/^ in 
Caqv, that is closest to x, i.e. 

ycLk = ^^S^^^yeCNN^^^^' Cnn)), then assign x to 
the cluster CL^. 

We apply this minimal net clustering algorithm to the 
set of all miRNA profiles recorded by our microarrays. We 
define the cluster diameter diam(CL) of cluster CL as the 
maximum distance of any two observations in the cluster, 
i.e. diamiCL) = m2iXx,yeCL ^fej)? where. Compared with 
the commonly used clustering method such as K-means 
and Hierarchical clustering, the minimal net algorithm 
allows us to control the diameter of all clusters by set- 
ting the threshold s representing the supremum radius of 
the cluster, while the K-means and hierarchical clustering 
is often used to determine the number of of clusters but 
not their diameters. If we select a very small threshold 
the expression levels of genes in the same cluster can be 
considered almost identical. 

Parameter estimation for the CKE models 
Parameters estimation strategy 

A generic strategy for parameter estimation in CKEs sys- 
tems is to minimize a cost function evaluating the discrep- 
ancy between model predictions and experimental data. 
Each one of our CKE models has a single output vari- 
able, namely the expression level g(t) of mRNA gene G 
for a TD motif, and the expression level p(t) of protein P 
for a TI motif. The output variable g(t) or p(t) of a CKE 
model can be estimated by a function or p(t) once one 
knows the profiles of all other molecular species involved 
in the model. The estimators g(t) or p(t) are respectively 
determined by CKE (1) or CKE (4). 
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Each CKE models is parameterized by a parameter 
vector w of dimension ripar- The quality of fit of this 
model with recorded profiles data is quantified by the size 
ERR(w) of the estimation error defined as follows 

ERR(y,) = m^x\f{t) -f(t)\ 

where the output variable f(t) is equal to g(t) for TD 
motifs and to p(t) for TI motifs. The concrete goal is to 
find the best parameter vector w by minimization of the 
lack of fit ERR(w) over all possible values of w. 

There are no magic solutions for such non linear mini- 
mization problems. Moreover fast computing was essen- 
tial here, since we usually have to solve a very large 
number of similar "quality of fit maximization" problems 
when dealing with large microarray datasets. 

We have tested several generic cost minimization 
approaches (see [33-35]) such as "genetic algorithms", as 
well as "gradient descent" to minimize a sum of squared 
modeling residuals. These two techniques turned out to 
require far too much computing time and were often 
unreliable due to their high dependence on initialization 
values. 

We hence developed our own fast CKE parametriza- 
tion algorithms to optimize the quality of fit between 
CKE models and microarray profiles data. This optimized 
quality of fit, adequately balanced by a systematic empha- 
sis on parsimoniously parameterized models, becomes an 
essential clue to decide which potential interactions one 
should validate between miRNAs, mRNAs, and associated 
proteins. 

Parameters parsimony requirement 

Robustness of CKE parametrization is the main motiva- 
tion for our parameter parsimony requirements. Consider 
CKE (1) modeling a TD-motif TDM involving riact acti- 
vators and Hrep repressors for the transcription of mRNA 
gene G, and one miRNA M degrading the transcription 
of G. Then the number ripar of unknown parameters is 
npar = 3 + 2(nact + J^rep)' Each profile g(t),m(t),ai(t), 
rj(t) is recorded at ^ = ti, . . .,tp. The CKE outputs for 
TDM are g{ti), . . .,g{tp). Hence each microarray data set 
provides {p — 1) equations linking the ripar unknown 
parameters, since the recorded g{t) should be very close 
to the predicted values g{t) obtained by solving the ODE 
(1) with initial value ^(^i). 

Vapnik's results on model fitting (see [36]) show that 
robust accuracy of parameter estimates requires fairly 
high values of the ratio oi{p — l) / ripar ^ So ripar « 0^ — 1) 
is a necessary constraint, and we will impose the parame- 
ter parsimony requirement Upar S ip — ^) /4. Indeed when 
yipar > ifp — 1)> CKE models are overfitted and parameters 
are poorly estimated. 



CKE Parameter estimation 

Tl-motifs: plausible ranges for parameters 

Consider a generic TI motif TIM involving an mRNA gene 
G, its associated protein P, and k translation inhibiting 
miRNAs [Mi, . . ..M/^]. Let p(t)yg(t), mtif) be the expres- 
sion levels of G,Mi. We want to model TIM by CKE 
(4). 

Parametrized by the vector 
w = [y, A, Ml, Si, ... , Sj^] 

which involves (2k + 2) < 8 parameters. 

The degradation and translation rates y and k of protein 
P were unknown for most proteins. 

According to results from [31], only a small percent- 
age 2% of our 30,000 mRNAs have more than 2 potential 
binding sites for miRNAs, and only 0.02% mRNAs have 
as many as 7 such binding sites. So in our parameter esti- 
mations it is reasonable to restrict the number of binding 
sites Sj for miRNA Mj to be at most 5. 

Tl-motifs: optimizing the quality of fit 

The inputs of CKE (4) are the initial value p(ti) and the 
expression levels g(t), mi(t), . . . , m/((t) recorded at time 
dates tif . . .ftq, 
Discretizing the ODE (4) at time ^i, . . . , we get 

p(tj^i) - p(tj) = -yp(tj) + Xg(tj)H(tj) (13) 

where the percentage H(t) is as recalled above . By sum- 
mation this implies, for / = 2, 3, , ^ — 1 , the relation 

Pitj^i) -Piti) = -yQ(tj) + XL(tj) (14) 
where 

/ 

= ' ^^^/^ = SUmi^^^g{tn)H{tn) 

n=l 

In view of equation (14), when all expression profiles 
involved have been recorded until time tj, one can predict 
the still unknown value of pitj^i) by the following natural 
estimator p(^y+i), 

pitj^i) = piti) - yQ(tj) + XL(tj) (15) 

The quality of fit of this CKE model with recorded pro- 
files data will be quantified by the size ERR(w) of the 
estimation error numerically computed as follows 

ERR(w)= max \ p(tj) - p(tj) \ (16) 

which in view of (15) can be reformulated as 

ERR(w) = max | TZj + /x/y — VjX \ (17) 

J=1,...,T ^ ^ ^ 
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where for / = 1, . . . , ^ we have set 

=Pifj) -p(ti)fMj = Q(tj)vj = L{tj) 



(18) 



We seek a parameter vector w minimizing the cost func- 
tion ERR(w) in the parametric domain defined by 

y >0;X>0;Ui>0;l<Si<5 

Tl-motif: parameter estimation algorithm 

For each / = 1 fix an arbitrary integer 1 < Si < 
5. Call fhi and Hi the respective medians over time t of 
the functions mi(t) and Hi(t), Since the function Hi(t) = 
1/[1 + Uimi(t)]^' is monotonous in niiif), we have 



Hi = + 



(19) 



The assumption 0 < Hi{t) < 1 yields 0 < Hi(t) < 1. We 
will discretize the possible values of Hi(t) by constraining 
them to belong to a grid /zi, . . .,/z5 of 1 < 5 < 99 per- 
centage values equally spaced in [0, 1]. Since fhi is known, 
we invert equation (19) to compute a corresponding grid 
GRDi of 5 potential values for Ui = wS^W-^^^^' ~ 

Now for 1 < i < ky select and fix arbitrary values Ui in 
the grid GRDi. After selecting as restrictive above the set 
of parameter vector U = [ui,Si, . . . Uk,Sk\, the function 
H{t) is then completely determined for all t. 

Note that the set U of all possible such choices for U is 
of cardinal inferior toN = (5s)^. Since we do explore each 
one of these possibilities separately, we need to keep the 
number A/^ at a reasonable level, so we selected 5 = 99 for 

= 1 , 5 = 60 for = 2, 5 = 20 for = 3 etc. 

Fix any U in Since all recorded profiles involved in 
the TD motif are available at time ^i, . . . , we can use the 
U value to directly compute all the values H(tj), and then 
all the numbers tt/, /xy, vj defined by (18). We want to find 
values of the two last parameters y > 0 and k > 0 which 
will minimize 

ERRiy^ X) = max | ttj -\- fjijy — vjX \ 

i=l,...,q 

This problem is equivalent to minimizing the linear objec- 
tive function: 

^{y,X,z) = z 

under the (2q + 3) linear inequality constraints 

y >0; X>0; z>0 
z — (ttj + fijy — VjX) > 0 foicj = 1, . . . ,q 
z + (ttj + ixjy — VjX) > 0 for ; = I, . . .,q 

This is a classical constrained linear programming prob- 
lem, which can be solved by well known fast linear pro- 
gramming algorithms [37], to provide the optimal values 
7*, A.*, Then z* = £i?i?(y *, A*) is the minimal value of 
ERR{y,X). 



These optimal values are functions y'^iU), X*(Zi), z'^iU) 
of the partial vector of parameters U in U. We then select 
the optimal U* in U as the value of U which minimizes 
z'^iU) over U, The optimal parametrization w* of our TD 
motif is then given by w* = [y*(l/*), A*(^*), U% This 
new parametrization algorithm is fairly fast and has good 
accuracy. On a current laptop PC, our non-optimized 
MATLAB code implementation required less than 5 min- 
utes of CPU time for the parametrization of a typical 
Tl-model with 38 time points and 9 parameters [38]. 
After code optimization and a re-implementation in C, we 
expect this CPU time to be reduced to 2 minutes. The 
algorithm does not require any knowledge of the parame- 
ters ranges except for the number of binding sites <S, which 
is an advantage for the range of reaction rates of of many 
molecules of interest are usually unknown. 

TD-motifs: parameter estimation algorithm 

The parametrization algorithms just presented also apply 
to TD models, as we now sketch. The output variable is 
now the expression level ^(0 of mRNA gene G. 
Discretize the ODE (1) at time dates ^i, . . . , to get 

g(tj+i) - gitj) = -Pgitj) - vg(tj)m(tj) + KF(tj) (20) 

where F(t) is defined in (1). By summation this implies the 
relations 

gitj^i) - g(h) = -mtj) - vV{tj) + KK{tj) (21) 
where 

/ ; 
^(^;) = I] g^^n) ; V{tj) = J2 g(fn)m{tn) ; 

n=l n=l 

K{tj) = suni^^^ F(tn) 

When all expression profiles involved have been recorded 
until time tj, one can predict the unknown value of^(^y+i) 
by the estimator J^(^y+i) , 



g(tj+i) = g(ti) - mtj) - mtj) + KK(tj) 



(22) 



The quality of fit of this CKE model with the recorded 
profiles data is quantified by the size of the prediction error 
which is a function ERR(w) of the parameter vector w 

ERR(w)= max | g(t|) - g(t|) | 

j=l,...,q 

SO that 

ERR(w)= max \Tj -\- pjp rj^v - OjK \ (23) 

where we have set 

rj=g(tj)-g(ti) Pj=B(tj) rij = V(tj) Oj=K(tj) 

The TD model parameters SRi > 0, Ui > 0 and SAj > 
0, wj > 0 are the number of binding sites and the affinity 
constants for the transcriptional factors Ri and Aj, They 
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constitute a partial parameter vector U which we restrict 
as above by first imposing a moderate upper bound Smax 
on all the integers SRi^ SAj. and by selecting, also as done 
above, adequate finite grids for the values of the Ui and 
the Wj. This constrains U to belong to a finite set U. The 
cardinal N of is forced to remain at most of order 10^, 
by adequate constraints on S^nax ^nd on the coarseness of 
the Ui grids and the vj grids. 

To minimize ERR(w) , we fix as above an arbitrary U 
in U, We can then compute all the numbers ry, p/, r]j, Oj . 
Since U is fixed, the error size ERR(w) becomes a func- 
tion E(PfVfK) of the last 3 positive parameters (yS,v, /c), 
still given by equation (23). As above the minimization of 
E(fi, V, /c) is equivalent to a constraint linear programming 
problem where we want to minimize the linear function 
^(P,v,K,z) = z over the following set of (2q + 4) linear 
inequalities 

P > 0; V > 0; K > 0; z > 0 
z — (tj + pjfi + r]jv — Ojk) > 0 for/ = 1, . . . , ^ 
z + (tj + pjP + rjjV - Ojk) > 0 for; = 1, . . . , ^ 

Solving this constraint linear programming problem 
generates optimal parameters (^*,v*,/c*) and a minimal 
error z*, which are all functions of U. One concludes as 
above by selecting an optimal Z/* minimizing (U) over 
alia in ZY. 

Quality of fit for CKE models 

Consider any TD motif or TI motif A. We have seen how 
to compute a parameter vector w* optimizing the quality 
of fit between microarray data and our CKE model for A. 
This was done by minimizing ERR(w) = max^ \f(t) — 
fit) I, where/(0 is the main output variable of A and/(^) 
is the estimation oif{t) based on the CKE parametrized 
by w. 

For the optimal CKE parametrization w*, the lack of fit 
to data can then be evaluated by ERRiyf""). However we 
have seen in section Invariance by affine profile trans- 
formations' that when comparing two expression profiles 
recorded by microarray, natural distances between pro- 
files should be roughly invariant by changes of scale for 
these profiles. It would then be tempting to replace the 
absolute error of estimation \f{t) — f(t) \ at time t by the 



SRE{t) 



relative error of estimation 



V{t)-f{t)\ 

m 



. But relative errors 



become quite large whenever the output profile /((O is 
close to zero. To avoid such spuriously large error val- 
ues, while still preserving scale invariance whenever /(^) 
is not close to zero, we define the Smoothed Relative Error 
of estimation SRE(t) at time t by the following formula, 
where/ denotes the mean value of the profile / 



SRE(t) = 



\f(t) -f(t)\ 
fit) 



fit) -f(t)\ 
f 



when f(t) < 0.15/ 



(25) 



We finally quantify the Modeling Error MODER for the 
optimally parametrized CKE model of motif A by 



MODER = m2ixSRE(t) 

t 



Results and discussion 

Examples of application 

We implemented our model of TI on microarray data 
of mouse stem cells undergoing RA-induced differen- 
tiation, as provided by LC Science Inc, and previously 
analyzed by classical techniques in [18]. We took the 
recorded expression profiles for proteins/mRNAs GCNF, 
Oct4, Nanog and Sox2 at time points (0, 1.5, 3, 6)/(0, 3, 
6) and expression levels for 266 miRNAs on days 0, 1, 3, 
6 from [38] during ES cell differentiation. These profile 
data were interpolated at 19 intermediary time points, by 
Piecewise Cubic Hermite Interpolation (PCHIP) and the 
number of parameters were limited to be 4, i.e. only 1 
upstream miRNA was selected for the model, to satisfy the 
parameter parsimony requirement. 

For miRNA Mu the following linear transformation, 
which could be viewed as normalization, was done: 



mif) 



YYliit) - YHiit) 
oiYHi) 



+ 1 



when/CO > 0.15/ 



(24) 



where oimi) = ^E^C^KO - m(t))^, t = 0, 1/3, . . . , 6. 
Since || "^'^^^^^f^ II < 1, mit) is positive for t = 
0,1/3,..., 6. Taldng s = 0.15, we applied Minimal 
Net clustering (the MATLAB code can be downloaded 
through Additional file 1) to the transformed data of miR- 
NAs, mit), i = 1, 2, ... , 266, and obtained 107 clusters, 
in which the maximum cluster contains 14 miRNAs. We 
consider that the miRNAs belonging to the same cluster 
share the same normalized expression level within a neg- 
ligibly small error. For each cluster CLjJ = 1, . . . , 107, we 
determine the miRNA MCj which is the representative of 
the cluster CLj for the distance D , and we let mcj(t) be 
the expression level of MCj at time t We call mcj(t) is the 
expression level of cluster CLj. 

We successively implemented the TI model with only 
one repressor miRNA (the MATLAB code can be down- 
loaded through Additional file 2). After parametric mod- 
eling of our pre-selected 107 TI motifs, and evaluation 
of their quality of fit, we have validated only 3 clusters 
of miRNAs as translational inhibitors repressing protein 
Oct4. If an miRNA in CLj is validated by modeling as a 
potential repressor, all other miRNAs belonging to CLj 
are also potential repressors and can be validated numer- 
ically as well by the form invariability of the model under 
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affine transformation. Here we present one of these three 
vaUdated miRNAs of Oct4 (see Figure 2), where the cen- 
troid of the cluster is miRNA mmu-miR-lOa, while the 
other 6 miRNAs in the cluster are mmu-miR-203, 330, 
342, 470 and 99b. We used TarBase [39] to search all the 
experimentally validated miRNAs targeting Oct4 (PouSfl) 
in Mus musculus. By TarBase, only mmu-miR-470 has 
been experimentally validated and it is also numerically 
validated by our TI model. 

For protein GCNF, only two miRNAs, mmu-let-7b and 
mmu-let-181a, have been experimentally validated by Tar- 
Base and both of them belong to the list of the 20 miRNAs 
numerically validated by our TI modeling (we presented 
the validated cluster containing mmu-let-7b in Figure 3). 
Our modeling approach did not validate any miRNA 
repressing both proteins Nanog and Sox2, while there 
are 3 miRNAs and 1 miRNA experimentally validated as 
separate repressors of Nanog and Sox2 respectively. 

We here also present one example of TD motif for 
downstream factor mRNA Sox2 (Figure 4). With the 
assumption that the transcription factors are proteins 
Oct4 and Nanog [38], we validated cluster mmu-miR- 
134, 30a-3p, 30b, 335, 431, 433-3p, 434-3p, and 487b 
as degraders. Although (mmu-miR-134, Sox2) is also an 
experimentally tested pair, we will not discuss deep in 
detail the validation results of the TD motifs in this 
paper. The main reason is that the validation results 
of TD motifs depends much on our knowledge of the 



transcription factors. More discussion is in the subsection 
below. 

Discussion 

In [38], we pre-selected the potential miRNAs for each 
gene/protein by TargetScan 5.0 and miRanda before 
applying the two models. The pre-selection of miRNA 
candidates were not necessary though it greatly reduced 
the computational cost. However, the application of the 
CKE modeling was dependant on the target prediction 
algorithm, such as TargetScan or miRanda. Therefore, 
we introduced Minimal Net Clustering in this paper so 
that the data was condensed and the computational cost 
could be reduced by a purely numerical method without 
biological bias. 

Since the results of TD model depend on information 
of transcription factors, the modeling validates not only 
miRNAs acting as mRNA degraders but also upstream 
transcription factors simultaneously. In [38] we numeri- 
cally validated proteins GCNF, Oct4 and Nanog as tran- 
scription factors for mRNAs Oct4 and Nanog, while Mar- 
son et al. and Boyer et al. [40,41] claimed that Oct4 
and Sox2 are bounded and act together with Nanog as 
transcription factors. Considering the impact of the tran- 
scription factors, the TD modeling may be less convincing 
unless the transcription factors are fixed as experimentally 
validated. In this paper we presented more details on the 
implementation and validation results of the TI model in 



inhibitor cluster: miR-10a, 203, 330, 342, 470, 99b 




IVIODER=0.09 

Figure 2 Example of validated TI motif. Example of TI motif repressing Oct4. All expression profiles are over days 0-6. Upper 6 profiles: normalized 
expression levels of numerically validated miRNAs in the same cluster: mmu-miR-1 Oa, 203, 330, 342, 470 and 99b. Bottom 2 profiles: Blue line = 
recorded levels. Red dash line = predicted levels. "MODER" is the model global relative error of prediction. 
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inhibitor cluster: let-7b,let-7g,let-7i, miR-106b, miR-122a, miR-98 




2 3 4 

protein GCNF 




MODER=0.10 

Figure 3 Example of validated Tl motif. Example ofTI motif repressing GCNF. All expression profiles are over days 0-6. Upper 6 profiles: 
normalized expression levels of numerically validated miRNAs in the same cluster: mmu-let-7b, let-7g, let-7i, miR-1 06b, miR-1 22a, miR-98. Bottom 2 
profiles: Blue line = recorded levels. Red dash line = predicted levels. "MODER" is the model global relative error of prediction. 



degrader cluster: miR-1 34, 30a-3p, 30b, 335, 431 , 433-3p, 434-3p, 487b 




mRNA Sox2 



8000 




MODER=0.03 

Figure 4 Example of validated TD motif. Example ofTD motif repressing Sox2. All expression profiles are over days 0-6. Upper 8 profiles: 
normalized expression levels of numerically validated miRNAs in the same cluster: mmu-miR-1 34, 30a-3p, 30b, 335, 431 , 433-3p, 434-3p, 487b. 
Bottom 2 profiles: Blue line = recorded levels. Red dash line = predicted levels. "MODER" is the model global relative error of prediction. 
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order to focus on the validation of miRNAs and avoid the 
influence of assumptions of transcription factors. 

TarBase shows that, for protein GCNF the experi- 
mentally validated miRNAs are mmu-let-7b and mmu- 
miR-181a; for protein Oct4 the experimentally validated 
miRNA is mmu-miR-470; for protein Nanog the exper- 
imentally validated miRNAs are mmu-miR-134, mmu- 
miR-470, and mmu-miR-296; for protein Sox2 the experi- 
mentally validated miRNA is mmu-miR-134. In this paper, 
we have validated by TI modeling and minimal net clus- 
tering all the experimentally tested miRNA repressors of 
GCNF and Oct4. In our previous work [38], actually none 
of these miRNA repressors had yet been studied for mod- 
eling for the four proteins because of the restriction of 
pre-selection. And only the pair (mmu-miR-lSla, GCNF) 
was validated by the classical correlation analysis done in 
Gu et al. [18] for the 4 proteins GCNF, Oct4, Nanog, Sox2. 
Therefore, the TI modeling combined with data con- 
densation not only reduced computational cost but also 
clearly extended the set of miRNA inhibitors validated by 
model fitting to microarray data. 

Since each numerically validated miRNA cluster may 
contain two or more miRNAs, the miRNAs in the same 
cluster could also be considered as potential candi- 
date inhibitors for further experiments to validate. For 
instance, as Figure 2 shows, mmu-miR-203, 330, 342, 10a 
and 99b are potential candidates for protein Oct4 for 
they are in the same cluster as mmu-miR-470, which is 



validated by both the numerical modeling and experi- 
ments. 

After the pair (mmu-miR-181a , GCNF) was well vali- 
dated by our CKE model, we found that the cluster (see 
Figure 5, top left) containing mmu-miR-181a also includes 
mmu-miR-103 and mmu-miR-107, which are two known 
miRNAs that have the same roles in regulating insulin 
sensitivity and promoting metastasis of colorectal cancer 
[42,43]. We also checked that miR-103 and miR-107 have 
almost the same mature sequences: 

mmu-miR-103: AGCAGCAUUGUACAGGGCUAUGA 
mmu-miR-107: AGCAGCAUUGUACAGGGCUAUGA 

After the pair (mmu-let-7b, GCNF) was well validated 
by our CKE model, we observed mmu-let-7g, and mmu- 
let-7i are in the same cluster as mmu-let-7b (see Figure 5, 
top right). It was claimed that let-7b and 7g reduce tumor 
growth in mouse models of lung cancer [44]. We then 
checked that indeed these three miRNAs have very similar 
mature sequences, namely 

mmu-let-7b: UGAGGUAGUAGGUUGUGUGGUU 
mmu-let-7g: UGAGGUAGUAGUUUGUACAGU 
mmu-let-7i: UGAGGUAGUAGUUUGUGCUGU 

To evaluate the correlation between mature sequence 
and expression profile of our set of 266 miRNAs, we 
systematically explored all the 266 x 265/2 = 35,245 
pairs (m/r/, mirj) of distinct miRNAs in this set, / 7^ 
= 1, . . . , 35245. For each such pair we then computed 



mmu-miR-100, 103, 107, 10b, 125b, 145, 181a mmu-let-7b, 7g, 7i, 106b, 122a, 98 
1.61 ' ' ' 1 1.81 ' ' ' 1 




quantile quantile 

Figure 5 miRNA clusters and analysis of mature sequences and expression levels. Top left, expression profiles for a cluster of 7 miRNAs. Top 
right, expression profiles for a cluster of 6 miRNAs. Bottom left, the blue curve is the quantile curve of expression distances for miRNA pairs with high 
alignment scores {NWA > 1 0), the red curve is the analogous quantile curve for miRNA pairs with low alignment scores {NWA < 1 0). Note that the 
blue curve lies below the red curve indicates that when NWA increases, then expression distance tends to stochastically decrease. Bottom right, 
similar to bottom left but the NWA threshold changes to NWA = 1 5. 
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the Euclidean distance between the expression level pro- 
files (expression distance in short) and the Needleman- 
Wunsch alignment score (NWA) between the mature 
sequences of each miRNA pair. We then divided the 
35,245 miRNA pairs into two groups: GroupHigh includes 
the pairs with high alignment score {NWA > 10), i.e. 
GroupHigh includes miRNA pairs with similar mature 
sequences, and GroupLow includes the pairs with low 
alignment score {NWA < 10). We then compared the 
distribution of expression distances for all miRNA pairs 
in GroupHigh with the distribution of these distances for 
miRNA pairs in GroupLow. As seen in Figure 5 (bot- 
tom left), the quantiles of these distances in GroupLow 
are consistently larger than the corresponding quantiles 
in GroupHigh. This is fully confirmed by Kolmogorov- 
Smirnov test which yielded the very significant p-value 
7 X 10~^^. The result still holds when we change the 
alignment score threshold NWA = 10 used to define 
GroupHigh and GroupLow (see Figure 5, bottom right. 



where the NWA threshold is now NWA = 15). We 
conclude that miRNAs having similar mature sequences 
tend, with high probability, to have similar expression 
levels. The above analysis and examples indicate that 
miRNAs belonging to the same cluster are good candi- 
dates to have similar mature sequence. Since the match 
between miRNA mature sequence and target sites is the 
main determinant for miRNA targets, miRNAs belong- 
ing to same cluster may hence also have similar regulatory 
roles. 

Robustness of parameter estimation 

Estimation algorithms for nonlinear models may yield 
parameter estimates that are dependent on the particu- 
lar set of data or on initial estimates of parameters. Since 
our parameter estimation algorithm is independent from 
initial estimates of parameters, we now focus on on the 
measurements errors affecting microarray data and on 
their impact for parameter estimation. Considering that 
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Figure 6 Histogram of re-estimated parameters for a TD model, a) the histogram of re-estimated parameter ^, degradation rate of Sox2, the 
originally estimated ^ value of the model is below the graph, b) the histogram of re-estimated parameter /c, transcription rate of Sox2, the originally 
estimated k value of the model is below the graph, c) the histogram of re-estimated parameter v, reaction rate of Sox2 and mmu-mir-21,the 
originally estimated \/ value of the model is below the graph, d) the histogram of re-estimated parameter reaction constant of Sox2 and Oct4, 
the originally estimated /xi value of the model is below the graph, e) the histogram of re-estimated parameter /X2, reaction constant of Sox2 and 
Nanog, the originally estimated /X2 value of the model is below the graph, f) the histogram of re-estimated parameter Si, S2, number of binding 
sites of Oct4 and Nanog on Sox2 respectively, the originally estimated Si and S2 value of the model is below the graph. 
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the noises of the microarray data are not negligible, we 
have analyzed the robustness of our parameters estima- 
tors when one perturbs randomly the observed expression 
levels of miRNAs. We have selected an arbitrary vali- 
dated model MD, where the corresponding downstream 
factor is denoted as A and has expression levels d{t). 
Denote miRNAs Mi,..., My pertaining to model MD 
and call their expression levels mi(t), . . mjif). Then we 
have perturbed the expression levels of the miRNAs by 
independent random noises having the recorded stan- 
dard deviation Gi{t)y . . . ■,Gj{t) and obtained simulated 
expression levels smi(t), . . . ,smj(t). After injecting the 
perturbed expression levels smi(t), . . . ,smj(t) into the 
model MD and get the predicted expression level pd(t) 
ofD. With pd{t)y mi{t)y...y mj{t) and expression levels of 
other upstream factors, we then applied our parameter 
estimation algorithm to re-estimate the model parame- 
ters. This procedure was repeated 100 times. Then for 
each model parameter, we plotted the histograms of those 
100 re-estimated parameter values and compared them 
with the parameter values estimated from unperturbed 
data of model MD. This analysis showed that our param- 
eter estimation algorithm is quite robust. Here we present 
the histograms of perturbed estimates of model param- 
eters for one TD motif (Figure 6) and one TI motif 
(Figure 7). 



Conclusion 

We have separately modeled by chemical kinetics 
equations the 2 distinct modalities of the repressive 
actions of miRNAs on post-transcriptional processes of 
mRNA genes and the associated proteins. This was 
achieved by first defining the formal structure of two types 
of interaction architectures (Transcription Degradation 
motifs and Translation Inhibition motifs ) linking miRNAs 
to subgroups of mRNA genes. The plausibility of each 
one of these potential TD motifs or Tl-motifs was then 
evaluated by computerized parametric modeling, based 
on microarray data, of adequate formal chemical kinetics 
equations (CKEs). 

We have sketched the formal derivation of 2 spe- 
cific CKEs modeling by dynamic ODEs the interactions 
between concentrations of different species of molecules 
involved in each architecture. This led to a motif valida- 
tion strategy based on the quantified quality of fit between 
our optimally parametrized models and the correspond- 
ing microarray data. 

Our computerized parameter estimation is imple- 
mented by an innovative fast algorithm that does not 
require knowledge of range of molecular reaction rates. 
On a current standard laptop PC, our implementation 
of parameter estimation for a typical 9-parameters CKE 
model requires about 5 minutes of computing time. 
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c) histogram of ji 
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Figure 7 Histogram of re-estimated parameters for a TI model, a) the histogram of re-estimated parameter y, degradation rate of GCNF, the 
originally estimated y value of the model is below the graph, b) the histogram of re-estimated parameters, translation rate of GCNF, the originally 
estimated X value of the model is below the graph, c) the histogram of re-estimated parameter ii, reaction constant of GCNF and mmu-let-7b, the 
originally estimated /x value of the model is below the graph, d) the histogram of re-estimated parameters, number of binding sites of mmu-let-7b 
on GCNF respectively, the originally estimated 5 value of the model is below the graph. 
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Our parameter estimation algorithm also provides rela- 
tively high-quality optimization for the fit between model 
and microarray data, by integrating both global and local 
cost minimization techniques, in contexts where plausible 
ranges of values for most of the unknown parameters are 
not available in the literature. By perturbing the expres- 
sion levels of miRNAs and re-estimating the parameters, 
we showed that our parameter algorithm has a satisfactory 
level of robustness. We believe that our parameter estima- 
tion technique with associated evaluation of quality of fit 
would be quite applicable as a generic algorithm to simi- 
lar problems in chemical kinetics modeling of molecular 
interactions. 

Modeling very large microarray data is computationally 
quite expensive. We have hence sketched clustering meth- 
ods to condense large microarray data. This approach has 
of course been attempted before our work, but the main 
point is that we have carefully studied the mathematical 
compatibility of our CKE models with condensation of 
the profiles data. Since we have proved that the abstract 
form of our CKE models is invariant by arbitrary multiple 
affine transformations of profiles data, we have made sure 
to constrain the distance of two expression levels profiles 
to be invariant by these types of affine transformations. 

We have implemented a Minimal Net Clustering algo- 
rithm based on this distance, which allows us to control 
the radius of the clusters. The number of CKEs to param- 
eterize can be strongly reduced after condensation of the 
large data sets, and the affine invariance of our CKEs 
show that the condensed genes network can then still be 
modeled by similar CKEs. 

By applying our TI modeling to multiple proteins such 
as GCNF, Oct4, Nanog, Sox2, we showed that 3 miRNA- 
target pairs experimentally validated can be also validated 
by the TI model. 
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